This report was prepared for ETC5521 Assignment 1 by Team Grevillea, comprising Samuel Lyubic and Brendi Ang.

Introduction

The Tour de France is an annual cycling race held in France. It comprises 21 stages over 23 grueling days, and the winner is the rider with the lowest elapsed time once every stage has been completed. It is considered one of the most prestigious races, if not the most prestigious, that elite cyclists can take part in, given the difficulty of riding through the varied French landscape (EEB (2020)). Although it is an individual event, riders travel through the country in teams of 8 so that the team leader can be placed in the best position to win the general classification. Teams work with their leader to reduce the physical strain of the course, target particular parts of the race and block certain riders from advancing, among many other coordinated efforts for the benefit of their lead rider (EngberD (2005)). The teams pass through long countryside roads, steep alpine regions and tight city areas, covering terrain that ranges from rolling hills and long flat ground to steep mountain inclines. These terrains are spread across the stages over the 23 days, with the breakdown usually consisting of flat stages, mountain stages, hilly stages and time trial stages. The division of the stages and the combination of terrains is one of the reasons the Tour is considered so grueling, and one of the greatest difficulties for riders and teams to overcome: each type of terrain usually suits a rider with a specific build, yet to be victorious a rider must perform across all terrains and plan a strategy that controls the race effectively (Rickman (2019)).
Given the chess-like strategy that runs in tandem with the elite physical attributes and skills required to win the general classification, this report analyses historical Tour de France data on past results, riders and teams to answer the primary question: “What are the most important characteristics and attributes to win the tour and how have they evolved over time?” A number of follow-up questions inspect the range of variables associated with winning the Tour de France in order to explain what it takes to come out victorious.

History of Tour De France

Tour De France (“Le Tour”) is an annual men’s bicycle race that in modern days encompasses a 21-day course that covers approximately 3,500km. It is predominantly held in France while often passing through other countries. Cyclists from around the world gather for a chance to win the prestigious ‘Tour De France’ trophy, along with a cash prize of €550,000 (Encyclopedia Britannica (2020)).

The tour is typically split into several stage types: flat, hilly, mountain, and individual and team time trials. One stage is ridden per day and each stage has a winner; the rider with the lowest aggregate time across all stages wins the overall title.

The Jersey Hierarchy

The colours and patterns of the jerseys are worth looking out for in every race. As a brief overview, riders within a team wear the same coloured jersey, current road champions wear their team jersey in their country’s colours, and the reigning world champion wears a team jersey with stripes (Encyclopedia Britannica (2020)). The jersey hierarchy, which is based on how long each competition has been in place, serves a symbolic purpose:

Yellow: General Classification (Race leader): The “General Classification” is the oldest and main competition sought after by competitors, and its winner takes the overall title. After each stage, the rider with the lowest aggregate time gains the privilege of wearing the most coveted jersey in the race.

Red & white polka-dot: Mountains Classification (Best climber): On stages containing climbs, points are awarded to the riders who reach the summit first. The jersey is worn by the rider who, at the start of each stage, has the largest number of climbing points.

Green: Points Classification (Top sprinter): Worn by the rider who has the greatest number of points at the start of each stage, where points are given to the first 15 riders to finish a stage. Additional points are given to the first 15 riders to cross a pre-determined ‘sprint’ point in each stage (Encyclopedia Britannica (2020)). This jersey is usually worn by ‘sprinters’.

White: Young Rider Classification (Most impressive rider under 26 years old): Determined in the same way as the General Classification, but restricted to riders under 26 years old.

Teams

You may ask: why have teams when there is only one individual champion in the General Classification? ‘Domestiques’ (French for ‘servants’) is a term frequently used to describe riders who support a team leader in winning stages and accumulating the least elapsed time by the end of the race. For instance, to set up a rider for a sprint, one rider rides at high speed with the team’s sprinter following behind to cut wind resistance and conserve energy (Prinz and Wicker (2012)); this is known as a ‘lead-out train’. Teams typically consist of 8 riders and are denoted by numbers, with the team leader wearing the lowest number.

Tour De France Data

Getting the Data (and Debugging Process)

Initially, we downloaded all the data sets (.csv format) directly from the ‘tidytuesday’ GitHub link, which was the most convenient way to load and read in the Tour De France data. However, we discovered errors in the time and elapsed variables, as the lubridate::period values were missing or incorrect. Furthermore, the tdf_stages data set only provided data up to the 2017 edition, while the actual data set covers editions up to 2019. Thus, we opted to use the cleaning script found at the same GitHub link.

Accordingly, our debugging process included renaming inconsistent variable names, extracting components of date-time objects and using regular expressions to fix the structure of character strings, so as to reproduce the data set that was intended. Further, we web scraped Wikipedia (Wikipedia contributors (2020) and Wikipedia contributors (2020)) to obtain the stages data for 2018 and 2019. These Wikipedia pages are the source from which the original data set was built; thus binding the data was straightforward, as its structure was analogous to the tdf_stages data set.
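To illustrate the kind of string repair described above, the sketch below converts period-style strings such as "17H 45M 13S" (the format visible in the time and elapsed columns) into seconds with a regular expression. It is written in Python purely for illustration and is not the cleaning script the report used; the function name `period_to_seconds` is hypothetical.

```python
import re

# Optional hour, minute and second components, e.g. "17H 45M 13S", "34M 59S" or "55S".
PERIOD_RE = re.compile(r"^(?:(\d+)H)?\s*(?:(\d+)M)?\s*(?:(\d+)S)?$")


def period_to_seconds(text):
    """Convert a period string such as '1H 2M 48S' to seconds.

    Returns None when the value is missing or does not match the expected shape,
    so malformed entries can be inspected rather than silently coerced.
    """
    if text is None:
        return None
    match = PERIOD_RE.match(text.strip())
    if match is None or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

Returning None for unparseable values, rather than zero, keeps missing periods distinguishable from genuine zero-length times in later calculations.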

Data Description

The data used for this analysis originates from the tdf R package, which is essentially a container for the editions data frame. This data set, along with its cleaning script, is hosted on GitHub and is readily available. It encompasses historic data about Tour de France riders, stages and winners.

## Rows: 106
## Columns: 19
## $ edition       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ start_date    <chr> "1903-07-01", "1904-07-02", "1905-07-09", "1906-07-04",…
## $ winner_name   <chr> "Maurice Garin", "Henri Cornet", "Louis Trousselier", "…
## $ winner_team   <chr> "La Française", "Conte", "Peugeot–Wolber", "Peugeot–Wol…
## $ distance      <dbl> 2428, 2428, 2994, 4637, 4488, 4497, 4498, 4734, 5343, 5…
## $ time_overall  <dbl> 94.55389, 96.09861, NA, NA, NA, NA, NA, NA, NA, NA, 197…
## $ time_margin   <dbl> 2.98916667, 2.27055556, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ stage_wins    <int> 3, 1, 5, 5, 2, 5, 6, 4, 2, 3, 1, 1, 1, 4, 2, 0, 3, 4, 4…
## $ stages_led    <int> 6, 3, 10, 12, 5, 13, 13, 3, 13, 13, 8, 15, 2, 14, 14, 3…
## $ height        <dbl> 1.62, NA, NA, NA, NA, NA, 1.78, NA, NA, NA, NA, NA, NA,…
## $ weight        <int> 60, NA, NA, NA, NA, NA, 88, NA, NA, NA, NA, NA, NA, NA,…
## $ age           <int> 32, 19, 24, 27, 24, 25, 22, 22, 26, 23, 23, 24, 33, 30,…
## $ born          <chr> "1871-03-03", "1884-08-04", "1881-06-29", "1879-06-05",…
## $ died          <chr> "1957-02-19", "1941-03-18", "1939-04-24", "1907-01-25",…
## $ full_name     <chr> NA, NA, NA, NA, "Lucien Georges Mazan", "Lucien Georges…
## $ nickname      <chr> "The Little Chimney-sweep", "Le rigolo (The joker)", "L…
## $ birth_town    <chr> "Arvier", "Desvres", "Paris", "Moret-sur-Loing", "Pless…
## $ birth_country <chr> "Italy", "France", "France", "France", "France", "Franc…
## $ nationality   <chr> " France", " France", " France", " France", " France", …

winners corresponds to a condensed version of the editions data frame provided by the tdf package. Each row corresponds to one edition of the Tour De France, spanning from the first edition in 1903 to the latest edition at the time of writing (2019). It contains the overall winner’s biographical data and information about the race for a given edition. Some of the variables include:

  • start_date: start of the Tour De France edition
  • winner_name: name of the overall winner (i.e. winner of the General Classification)
  • distance: aggregate route distance (in km) covered by the entire race
  • time_margin: difference between the winning time and the runner-up’s time (in hours)
  • stage_wins: stages won by the overall winner
  • stages_led: number of stages spent in the yellow jersey
  • height: overall winner’s height (in meters)
  • weight: overall winner’s weight (in kilograms)
  • born: overall winner’s date of birth
  • full_name: overall winner’s full name
  • nationality: overall winner’s nationality
## Rows: 2,278
## Columns: 8
## $ Stage          <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11…
## $ Date           <chr> "2017-07-01", "2017-07-02", "2017-07-03", "2017-07-04"…
## $ Distance       <dbl> 14.0, 203.5, 212.5, 207.5, 160.5, 216.0, 213.5, 187.5,…
## $ Origin         <chr> "Düsseldorf", "Düsseldorf", "Verviers", "Mondorf-les-B…
## $ Destination    <chr> "Düsseldorf", "Liège", "Longwy", "Vittel", "La Planche…
## $ Type           <chr> "Individual time trial", "Flat stage", "Medium mountai…
## $ Winner         <chr> "Geraint Thomas", "Marcel Kittel", "Peter Sagan", "Arn…
## $ Winner_Country <chr> "GBR", "GER", "SVK", "FRA", "ITA", "GER", "GER", "FRA"…

The editions data set contains the variable stage_results, a list of lists in which each element holds the stage results for a particular year or edition of the Tour De France. When we unnest, or flatten, it back into regular columns, we get the tdf_stages and stage_clean data sets. The tdf_stages data set describes the stage winner at each stage of an edition, along with the stage type/terrain and the distance between the start and end locations of the stage. Some of the variables include:

  • stage: Stage number of the edition
  • date: Date of stage
  • origin and destination: Start and end location of the stage
  • distance: Distance (in KM) for the stage
  • type: type of stage
  • winner: Name of the stage winner
## Rows: 255,752
## Columns: 11
## $ edition          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ year             <int> 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903…
## $ stage_results_id <chr> "stage-1", "stage-1", "stage-1", "stage-1", "stage-1…
## $ rank             <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "…
## $ time             <chr> "17H 45M 13S", "55S", "34M 59S", "1H 2M 48S", "1H 4M…
## $ rider            <chr> "Garin Maurice", "Pagie Émile", "Georget Léon", "Aug…
## $ age              <int> 32, 32, 23, 20, 36, 37, 25, 33, NA, 22, 26, 28, 21, …
## $ team             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ points           <int> 100, 70, 50, 40, 32, 26, 22, 18, 14, 10, 8, 6, 4, 2,…
## $ elapsed          <chr> "17H 45M 13S", "17H 46M 8S", "18H 20M 12S", "18H 48M…
## $ bib_number       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

The stage_clean data set expands on tdf_stages by including the results of all riders and their ranks at every stage, where each row corresponds to the result of a particular rider, such as points accumulated for the stage. This data set includes variables such as:

  • edition: Race edition pertaining to the year
  • rider: The name of the rider/participant
  • age: Age of rider at that point in time
  • rank: Rank of the rider for the stage
  • elapsed: Time elapsed for the stage

  • The tdf_winners data comes from tdf_winners.csv. The data contains information about the 106 winners of the Tour de France from 1903 to 2019. The variables are shown in Table 1.

Table 1: Variables in tdf_winners data
Variable | Class | Description
edition | integer | Edition of the Tour de France
start_date | double | Start date of the Tour
winner_name | character | Winner’s name
winner_team | character | Winner’s team (NA if not on a team)
distance | double | Distance traveled in KM across the entire race
time_overall | double | Time in hours taken by the winner to complete the race
time_margin | double | Difference in finishing time between the race winner and the runner up
stage_wins | double | Number of stage wins (note that it is possible to win the GC without winning any stages at all)
stages_led | double | Number of stages spent as the race leader (wearing the yellow jersey) by the eventual winner
height | double | Height in meters
weight | double | Weight in kg
age | integer | Age as winner
born | double | Year born
died | double | Year died
full_name | character | Full name
nickname | character | Nickname
birth_town | character | Birth town
birth_country | character | Birth country
nationality | character | Nationality
  • The stage_data data comes from stage_data.csv. The data contains ranking information for each stage of the annual race. The variables are shown in Table 2.

Table 2: Variables in stage_data data
Variable | Class | Description
edition | integer | Race edition
year | double | Year of race
stage_results_id | character | Stage ID
rank | character | Rank of racer for stage
time | double | Time of racer
rider | character | Rider name
age | integer | Age of racer
team | character | Team (NA if not on team)
points | integer | Points for the stage
elapsed | double | Time elapsed stored as lubridate::period
bib_number | integer | Bib number
  • The tdf_stages data comes from tdf_stages.csv. The data contains information on each stage of the annual race. The variables are shown in Table 3.

Table 3: Variables in tdf_stages data
Variable | Class | Description
Stage | character | Stage number
Date | double | Date of stage
Distance | double | Distance in KM
Origin | character | Origin city
Destination | character | Destination city
Type | character | Stage type
Winner | character | Winner of the stage
Winner_Country | character | Winner’s nationality

Analysis and findings

Time and place for speed

This section analyses the overall winners’ speed and how it changes from stage to stage relative to the rest of the field, in order to understand when winning riders consider it most vital to increase their pace over their competitors and widen the elapsed-time gap. To conduct this analysis, average rider speeds in km/h were calculated for each terrain type across every stage by dividing the stage distance by the elapsed time, using the distance and elapsed variables from the tdf_stages and stage_clean data sets. The analysis focuses on the Tours since 1988, as this is regarded as the beginning of the EPO era; restricting to this period keeps the comparison relative, given the extreme performance-enhancing nature of the doping that became prominent from this point onward due to the difficulty of detecting it (F (2013)).
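The speed calculation can be sketched as follows. This is a minimal Python illustration rather than the report's own code: it assumes each result row already carries the stage distance in km and the rider's elapsed time in seconds, and the dictionary keys and function names are hypothetical.

```python
from collections import defaultdict


def average_speed_kmh(distance_km, elapsed_seconds):
    """Average speed in km/h, i.e. distance divided by elapsed time in hours."""
    if elapsed_seconds is None or elapsed_seconds <= 0:
        return None  # missing or zero times cannot yield a speed
    return distance_km * 3600 / elapsed_seconds


def mean_speed_by_group(rows):
    """Mean rider speed per (stage, stage type, group).

    Each row is a dict with the hypothetical keys 'stage', 'type', 'winner'
    (True for the overall winner), 'distance_km' and 'elapsed_s'.
    """
    speeds = defaultdict(list)
    for row in rows:
        speed = average_speed_kmh(row["distance_km"], row["elapsed_s"])
        if speed is None:
            continue
        group = "Overall winner" if row["winner"] else "The Field"
        speeds[(row["stage"], row["type"], group)].append(speed)
    return {key: sum(values) / len(values) for key, values in speeds.items()}
```

Averaging per-rider speeds (rather than dividing total distance by total time) gives each rider equal weight within a stage, which is one reasonable reading of "average stage speed"; the other choice would give slightly different figures.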

Figure 1: Average stage speeds across the three main terrains for overall winners in km/h, since 1988

Figure 1 illustrates the average stage speed for all overall winning riders since 1988, across all the stages and stage types with stage number on the x axis and the average speed in kilometers per hour on the y axis. Flat, Hilly Stage, Mountain and Time Trial represent the speeds of the overall winners for each of those terrains and The Field represents the rest of the competitors. From the figure it is evident:

Flat Stages

  • The overall average pace of the winners and the field stays relatively consistent for the duration of the course, with speeds increasing slightly towards the middle and end of the Tour, until the legacy Champs-Élysées stages from 20 to 22.
  • Overall winners’ average speed is roughly equal to that of the field for the first 8 stages, with overall winners being 0.3km/h faster over the first two stages and 0.34km/h faster in stage 9, which registers the maximum speed for the Tour of 43.95km/h for overall winners and 43.56km/h for the field.
  • The field eclipses the overall winners in stage 10 by 0.43km/h.
  • The combined speeds show a decreasing trend from stage 12 to stage 14, with overall winners holding some of their largest speed differences of 0.96km/h, 0.63km/h and 0.96km/h across those three stages. The field produces marginally higher speeds than the overall winners across stages 15 to 17. At stage 18 the overall winners make the largest difference in average pace, with a speed of 43.64km/h, which is 1.3km/h greater than the field’s average speed of 42.34km/h for the stage.
  • The figure would indicate that across flat stages the field and winners are relatively equal; however, when flat stages fall on stage 12, 13, 14 and 19, overall winners seem to look to create the largest gap between them and the field and reduce their elapsed time along the flat terrain.

Mountain stage types

  • The combined average speed for both overall winners and the field has a slightly increasing trend from stages 2 to 5 and then takes a declining trend for the rest of the Tour, with a few upticks in speed from stages 6 to 8 and from 11 to 13. The maximum speed is 41.85km/h for overall winners and 40.56km/h for the field, recorded in stage 5.
  • Overall winners eclipse the field’s average pace at every mountain stage of the Tour. The largest gaps are present across stages 6 to 10, 12 to 15 and 17 to 20, with differences of 2.04km/h in stage 9, 1.99km/h in stage 12, 2.13km/h in stage 13 and the largest difference, 2.18km/h, in stage 20.
  • The figure indicates that when mountain stages fall on stages 6 to 10, 12 to 15 and 17 to 20, overall winners look to attack and create greater elapsed-time separation from the field. All these stages fall along the declining part of the overall speed trend for mountain stages, indicating that overall winners target harder stages to impose their presence and create the largest gap between their pace and the field’s, in order to build a larger separation in elapsed time.

Hilly Stages

  • The combined pace shows similar variation for both overall winners and the field across the first three stages. This is followed by several periods of declining speed: combined speed declines from stage 3 to stage 7, increases at stage 8 and then declines to stage 12, with another increase at stage 13 and a decline to stage 15. From there, overall winners increase in speed while the field declines further, followed by relatively similar variation for both until stage 20.
  • Overall winners have a greater average speed than the field throughout the Tour except for stages 10, 17 and 20, where the overall winners’ speed is marginally eclipsed by the field. The greatest average speed differences come when hilly stages fall on stages 8, 12, 13, 16 and 19, at 2.36km/h, 1.80km/h, 2.06km/h, 2.33km/h and 2.07km/h respectively, which indicates that overall winning riders look to maximise their time difference across these stages when they are hilly.

Time Trial

  • Combined average speed follows a generally declining trend through to stage 16, where the lowest average speed since 1988 is recorded: 22.53km/h for overall winners and 20.31km/h for the field. Speeds increase through to stage 18 and level out there.
  • Of all stage types, time trials show the largest speed gap between overall winners and the field. Overall winners also hold a larger gap for a longer stretch of stages than in the other three stage types.
  • This would indicate that overall winners are far more elite than the field at time trials and look to minimise their race time in them no matter which stage of the Tour the time trial falls on. The maximum speed across time trials comes when the time trial falls on stage one, with speeds of 49.53km/h for overall winners and 48.13km/h for the field.

What are the characteristics of winning riders?

Where does the greatest difficulty lie?

This section analyses the in-completion rate across the different types of stages. Each stage carries its own unique difficulties, and understanding where riders have historically shown the greatest weakness could help identify stages that should be targeted and stages for which energy needs to be conserved. To conduct this analysis, the total number of riders who failed to complete each stage was divided by the total number of competitors who have competed in the stage over time. The stage_clean data set records riders who “did not finish”, were “over the time limit” or were “not qualified” with a rank of “DNF”, “OTL” or “NQ” for the stage. These ranks were tallied for every stage, along with the number of participants, and the two were divided to produce the percentage rate of in-completion for each stage.
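A minimal sketch of that tally, in Python for illustration only: it assumes each result has already been paired with its stage type, and the function name and input shape are hypothetical rather than taken from the report's code.

```python
from collections import Counter

# Rank codes that mark a rider as not completing the stage, as listed above.
NON_FINISH_RANKS = {"DNF", "OTL", "NQ"}


def incompletion_rate_by_type(results):
    """Percentage of entrants who failed to complete, per stage type.

    `results` is an iterable of (stage_type, rank) pairs, one per rider per stage.
    """
    entered = Counter()
    failed = Counter()
    for stage_type, rank in results:
        entered[stage_type] += 1
        if rank.strip().upper() in NON_FINISH_RANKS:
            failed[stage_type] += 1
    return {t: 100 * failed[t] / entered[t] for t in entered}
```

Every row counts towards the denominator, including the non-finishers themselves, which matches the description of dividing by all riders who entered the stage.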

Figure 2: The percentage of riders that have not finished the four main stage types, since 1969

Figure 2 visualises the percentage of all riders that have failed to complete each stage type, with the stage type on the x axis and the percentage on the y axis. The Mountain and Hilly stages prove to be the hardest terrains for riders to finish. The percentages of riders who have failed to complete each stage type are:

  • Mountain stage: 2.13% of all riders that have entered the stage have failed to complete it
  • Hilly stage: 1.61% of all riders that have entered the stage have failed to complete it
  • Flat stage: 0.61% of all riders that have entered the stage have failed to complete it
  • Time trial: 0.29% of all riders that have entered the stage have failed to complete it

Time trials have the lowest percentage of riders failing to complete, at 0.29%, with Flat stages at over double that. Time trial stages are run individually and are usually shorter and most often on flatter terrain; these characteristics can make them safer, with reduced physical and collision risk. This may show that time trials are a vital stage type to master: their low in-completion rate, possibly due to safer conditions, could allow skilled riders to maximise their abilities and use these stages to improve their time gap over competitors. Furthermore, given that mountain and hilly stages show higher in-completion rates, tailoring training and strategy to these stages could be beneficial to the outcome of a rider’s race. Given their greater historical difficulty relative to the other stages, these could also be stages where a rider targets the competition, making them a strength in order to improve their position when other riders may struggle more.

Are elite riders similar?

This section analyses the body compositions of overall winners compared with those of elite stage riders, to assess whether there are similarities or differences among the elite rider styles and which stage-type body type may be best suited to overall victory. The stage-type body types have been broken down into climber for elite mountain stage riders, sprinter for elite flat stage riders and individual time trialist for elite individual time trial riders. The elite stage riders were identified by aggregating the individual stage winners and identifying the riders with the most wins for each stage type. Riders were selected down to the point where the largest gap in wins to the next rider fell; this gave six climbers, six time trialists and five sprinters. For overall winners, however, the upper bunch of multiple-time winners consisted of only 3 observations, so all winners since 1988 are shown. Because physiological data was not provided by the data set, each rider’s height and weight were sourced from cyclingranking.com ((“Cyclingrank.com” 2020)), entered into tibbles and then combined for the analysis.
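The "largest gap" selection rule can be sketched as below. This is an illustrative Python reading of the rule, not the report's code: riders are ranked by stage wins and the cut is placed at the biggest drop between consecutive win counts. The function name is hypothetical, and ties in gap size are resolved in favour of the higher cut, which is an assumption.

```python
def select_elite(win_counts):
    """Return the riders ranked above the largest drop in win counts.

    `win_counts` maps rider name to number of stage wins for one stage type.
    """
    ranked = sorted(win_counts.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return [name for name, _ in ranked]
    # Drop in wins between each rider and the next one down the ranking.
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps))  # first (highest-ranked) occurrence of the largest gap
    return [name for name, _ in ranked[: cut + 1]]
```

With made-up counts of 12, 11, 10, 4 and 3 wins, the largest drop is between the third and fourth riders, so the first three would be kept.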

Figure 3: Comparison of overall winner, sprinter and climbers body types

Figure 3 visualizes the body types of climbers, time trialists, sprinters and overall winners, with weight in kg along the x axis and height in meters on the y axis. The figure illustrates that elite individual time trialists and sprinters have relatively similar body compositions, with most riders falling in the upper right quadrant. Elite time trialists show most of their observations between 75kg and 76kg with a height between 1.83m and 1.86m, while sprinters show most of their observations between 1.83m and 1.84m in height and 75kg to 80kg in weight. Climbers and overall winners take a somewhat similar cone shape, with a relatively similar spread of weight; however, climbers tend to be shorter than some overall winners. Overall winners mostly weighed between 65kg and 75kg, with their height varying between 1.72m and 1.90m.

Figure 3 shows that climbers have the body type most similar to overall winners, which could indicate that having a lower weight is more beneficial for general classification victory at the Tour, with height not appearing to greatly affect results according to this figure.

Which way is age going?

Age often plays a big part in the assessment of an athlete’s potential performance. With developments in the medical, technological and scientific fields, athletes seem to be competing for longer. This section analyses the age of all competitors in the Tour de France and how it has changed over time. The average age of competitors was calculated for each year starting from 1988.
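A small Python sketch of the yearly average follows, for illustration only. Because the stage results hold one row per rider per stage, the sketch counts each rider once per year before averaging; that de-duplication step is an assumption about how the average was formed, and the function name is hypothetical.

```python
from collections import defaultdict


def mean_age_by_year(rows, start_year=1988):
    """Average rider age per year, counting each rider once per year.

    `rows` is an iterable of (year, rider, age) tuples; rows with a missing
    age or a year before `start_year` are skipped.
    """
    ages = defaultdict(dict)  # year -> {rider: age}
    for year, rider, age in rows:
        if year >= start_year and age is not None:
            ages[year][rider] = age
    return {
        year: sum(by_rider.values()) / len(by_rider)
        for year, by_rider in sorted(ages.items())
    }
```

Without the per-rider de-duplication, riders who complete more stages would carry more weight in the average than those who abandon early.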

Figure 4: The average age of riders at each Tour de France

Figure 4 visualizes the average age of contestants from 1988 to 2020, with the year on the x axis and the average age on the y axis. The figure shows that the average age has been increasing over time, going from 27.1 in 1998 to a peak of 29.67 in 2012 and ending at 29.22 at the 2019 Tour. The advancement of cycling technology and equipment, such as lighter frames and more aerodynamic attire (Steve (2009)), in conjunction with improvements in best practice and the integration of sports science, could be a reason riders are extending their careers: physical ailments that would have halted riders in the past are now dealt with more effectively and proactively (M (2016)).

How has age impacted overall results?

Figure 5: Age of overall winners

Figure 5 visualises the age of the overall winner since 1988, with the date on the x axis and the age on the y axis. A slight upward trend is present, with considerable variability across the time frame. This indicates that the age of winning riders is increasing; however, age does not seem to be the most critical element given the spread on both sides of the trend line. It is also worth noting that, given the EPO era this time frame covers, the increased riding and winning age could also be influenced by illicit substance use. This was notoriously the case with Lance Armstrong (J (2013)), who won seven straight Tours from 1999, aged 27, through to 2005, aged 32, as well as less well-known riders such as Bjarne Riis, who won in 1996 aged 32 and also admitted to doping (S (2020)). Miguel Indurain also won 5 Tours in a row from 1991 to 1995 (FotheringhamA (2017)). These substance-based results are the ones that are known, and there could potentially be others that have influenced the data set and skewed the results.

Although Figure 5 shows a slight upward trend in winning age, the three diagonal lines present relate to three riders who won consecutive Tours, two of whom have been flagged for doping. The use of performance enhancers is most likely the reason they were able to win one Tour, let alone that many in a row, which skews the age trend upward misleadingly. Overall, this would indicate that being younger or older may not be the largest factor in the chance of claiming victory at the Tour de France.

Is being part of team important?

Given the strategic nature of the race and the benefits riders gain from their team, this section analyses whether being part of a particular team may make a rider more likely to win the Tour, based on how often that team has produced a winner. To assess the most prevalent teams, the data set was filtered to show results from 1969 onward, the year that saw the re-introduction and professional sponsorship of teams in the Tour de France ((“1969 Tour de France” 2020)).

Figure: Displays the teams with the most wins since 1969

The figure above displays the most prevalent teams in Tour history, with the teams on the x axis and the number of wins on the y axis. It shows 5 teams that have dominated the results and collectively account for 60% of the winners since 1969. Specifically, Team Ineos, who are considered the most dominant team in Tour history, credit their performance to improved training habits, the implementation of recovery techniques, evolving technologies and marginal gains across the board, which collectively add up to a better overall performance across the team. Furthermore, the financial backing of these larger teams allows them to attract top-tier riders, coaches and staff through the pay packets they can provide, and to afford the state-of-the-art practices that assist performance, thus potentially increasing their dominance relative to smaller teams ((“What Makes Ineos Unbeatable?” 2019)). This may indicate that a rider’s chance of winning the Tour de France depends in part on the team they choose or are chosen for.

Does it matter where you’re from?

Different countries have different characteristics and elements that shape athletes in different ways. This section analyses whether particular countries are producing more winners and how the countries winners come from have changed over time.

Figure 6: Figure displaying a time series of the countries each winner is from

Figure 6 visualises the countries each winner has come from, with the year of the Tour on the x axis and the nationality on the y axis. The figure shows that up until 1985 France and Belgium were the two most dominant countries among winning riders. In 1986 the U.S. saw its first win, and the most dominant nations since then have been the U.S., Spain and Great Britain. This could be a result of the globalisation of the sport and of greater access and mobility in training and travel, making it easier for riders, coaches and teams to train and operate in the most desired regions with the highest calibre of trainers. However, it should be noted that this change comes around the time of the “doping era”, which clearly was a large factor in the wins tallied by the United States. Even with freedom of movement and access to terrain and training available to most athletes, a few countries still dominate the winning column, which could indicate where the best training partners and highest level of performance and training are found.

Quest for the Yellow Jersey

Procedure

The purpose of this section is to explore the trends of overall winners. We start by answering the most basic question: “Do overall winners win all stages throughout the edition?”. Here, we discovered that feats achieved by some overall winners in the early decades, such as wearing the yellow jersey for every stage of an edition, have not been repeated in contemporary editions. We therefore decided to home in on the last two decades, 2000 - 2020. We studied how overall winners were ranked in each stage and stage type to examine whether these riders specialised in any particular domain, such as sprinting or climbing. For this section, we utilised regular expressions to manipulate character strings, and relational joins to combine data sets with matching observations.
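As a rough illustration of the two techniques mentioned above, the following Python sketch uses a regular expression to pull the stage number out of a text label and then performs an inner join between stage results and overall winners. The records, the field names and the `stage-<number>` label format are hypothetical and are not taken from the data used in this report.

```python
import re

# Hypothetical records: names, fields and label format are illustrative only.
winners = [
    {"edition": 1, "winner": "Rider A"},
    {"edition": 2, "winner": "Rider B"},
]
stage_results = [
    {"edition": 1, "stage": "stage-1", "rider": "Rider A", "rank": 12},
    {"edition": 1, "stage": "stage-1", "rider": "Rider C", "rank": 1},
    {"edition": 1, "stage": "stage-2a", "rider": "Rider A", "rank": 3},
    {"edition": 2, "stage": "stage-1", "rider": "Rider B", "rank": 7},
    {"edition": 2, "stage": "stage-1", "rider": "Rider A", "rank": 2},
]

# Regular expression capturing the digits that follow "stage-".
STAGE_PATTERN = re.compile(r"stage-(\d+)")


def stage_number(label):
    """Return the numeric part of a stage label, or None if it does not match."""
    match = STAGE_PATTERN.match(label)
    return int(match.group(1)) if match else None


def winner_stage_ranks(winners, stage_results):
    """Inner join: keep only stage rows whose (edition, rider) is an overall winner."""
    winner_keys = {(w["edition"], w["winner"]) for w in winners}
    joined = []
    for row in stage_results:
        if (row["edition"], row["rider"]) in winner_keys:
            joined.append({
                "edition": row["edition"],
                "rider": row["rider"],
                "stage_number": stage_number(row["stage"]),
                "rank": row["rank"],
            })
    return joined


joined = winner_stage_ranks(winners, stage_results)
for row in joined:
    print(row)
```

Joining on both the edition and the rider matters here: a rider who won one edition should not be treated as the overall winner in another edition.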

A fallacy: Overall Winners lead all stages

Figure 7: How often do Overall Winners Wear the Yellow Jersey

Many speculate that overall winners are cyclists who consistently don the yellow jersey by winning all stages. However, the line plot (Figure 7) invalidates this conjecture for the majority of editions: the number of stages led by the overall winner varies greatly from one edition to the next. The plot further reveals that overall winners can lead as few as 1 stage, suggesting that they only took the lead of the Tour at the very last stage.

Figure 7 further illustrates that the number of stages led by overall winners has decreased in recent years compared to the earlier era. The imbalanced skill sets of riders may contribute to this. To elaborate, tour organisers introduced new trophies into the race over the years as a strategic tool to allow for a more vivid contest with multiple opportunities for duels between riders or teams (Andreff (2016)). Because this created ‘different races within the race’, the number of riders in contention for the yellow jersey has dwindled over the years as other riders fight for these other trophies. New trophies were introduced until 1989, when they stabilised (Andreff (2016)) into rewards for the best overall rider, climber, sprinter, young rider and team. This trend can be seen in Figure 7: as riders contend for different trophies, the number of stages led by the overall winner falls.

Nicolas Frantz and Antonin Magne proved, in 1928 and 1934 respectively, that leading all 22 stages is not unfeasible (Figure 7). However, we should note that tours are constantly changing and that, compared to the earlier era, riders now spend more days on the road with less time to rest. For instance, the time spent by riders on their bikes fell from 10–16 hours before the 1930s to 4–5 hours in the 2000s (Andreff (2016)).
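The "stages led" measure discussed in this subsection can be sketched as a simple count. The example below is a minimal illustration, assuming a hypothetical list that names the race leader after each stage; the rider names and the five-stage edition are invented.

```python
def stages_led(leaders_after_each_stage, overall_winner):
    """Count the stages after which the eventual overall winner held the race lead."""
    return sum(1 for leader in leaders_after_each_stage if leader == overall_winner)


# Hypothetical five-stage edition: the race leader after each stage.
leaders = ["Rider C", "Rider C", "Rider A", "Rider C", "Rider A"]

led = stages_led(leaders, "Rider A")
share = led / len(leaders)
print(f"Stages led by the overall winner: {led} of {len(leaders)} ({share:.0%})")

# A winner who only takes the lead at the very end leads a single stage.
late_takeover = ["Rider B", "Rider B", "Rider A"]
print(stages_led(late_takeover, "Rider A"))
```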

Ranks of Overall Winners in Different Stages

Figure 8: Ranks of Overall Winners across Stages (2000 - 2019)

The box plot (Figure 8) depicts the ranks of all overall winners from 2000 - 2019 for each stage. In most cases, overall winners are ranked 20th or better, which may imply that the winners keep a consistent pace that puts them in these ranks. Moreover, overall winners usually start out slowly, with a median rank of approximately 25. They then move up the ranks, particularly from stage 8 onwards, where they are consistently ranked better than 20th. Subsequently, in stages 17-20, overall winners seem to pick up their pace and usually rank in the top 10, implying that overall winners may start wearing the yellow jersey from this point forward. By the same token, this reiterates the consistent pace of overall winners and may suggest that riders who exerted more energy in earlier stages are unable to keep up with the overall winners in later stages.

Why are the ranks of overall winners in stage 21 rather high and peculiar? Stage 21 is better known as the “Champs-Élysées” stage, named after the symbolic street in Paris, France. As a tradition, riders call a truce in their competition to enjoy a celebratory moment of tranquility, chatting and even celebrating with glasses of champagne in the final kilometers (M (2017)). For this reason, the rankings of overall winners differ greatly in this stage.
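The per-stage summary that a box plot such as Figure 8 is built on can be sketched as follows. This is only an illustration under assumed inputs: the (stage, rank) pairs are invented and do not reproduce the values in the figure.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (stage number, rank of the overall winner) observations.
observations = [
    (1, 30), (1, 25), (1, 18),
    (8, 12), (8, 19), (8, 15),
    (18, 4), (18, 9), (18, 2), (18, 7),
]


def median_rank_by_stage(observations):
    """Group ranks by stage and return the median rank for each stage."""
    ranks = defaultdict(list)
    for stage, rank in observations:
        ranks[stage].append(rank)
    return {stage: median(values) for stage, values in sorted(ranks.items())}


medians = median_rank_by_stage(observations)
for stage, value in medians.items():
    print(f"Stage {stage}: median rank {value}")
```

The median is used rather than the mean so that a single unusual result, such as a ceremonial final stage, does not dominate the summary.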

Do Overall Winners Thrive in Certain Stage Types or Terrain?

Stages of the Tour de France

Figure 9: Proportions of stage type for an edition (2000 - 2020)

The analysis above brings us to the question: “What makes overall champions great in spite of all the different stages?”. Despite century-old traditions, tour organisers arrange the racecourse differently each year. Nonetheless, they do so with a goal in mind: to include a balance of the different types of stages. The dynamics and the different locations of each edition force riders to tailor their tactics to the different adversities of each course. In most editions, the largest proportion of the race is made up of mountain stages, followed by flat stages (Figure 9). The 2007 edition consisted only of mountain stages and time trials.

Strategies of Overall Winner

Table 4: Proportion of Overall Winners Ranked 5th or Better by Stage Type

Stage Type    Ranked 5th or better    Stage Type Total Count    Proportion
Flat          83                      113                       73.45%
Hilly         20                      26                        76.92%
Mountain      111                     147                       75.51%
Time trial    36                      51                        70.59%

Figure 10: Rankings of Overall Winners (2000 - 2019) by Stage Type

Given the length of the tour and the different terrains, the Tour de France is not simply won by the fastest rider; it calls for a more game-theoretical approach (Prinz and Wicker (2012)). Figure 10 illustrates that overall winners are versatile riders facing an itinerary that necessitates different strategies for the various stage types. This is also manifest in Table 4, which establishes that overall winners are ranked 5th or better in approximately 7 in 10 stages for every stage type. Notably, Figure 10 also reinforces the idea that overall winners do not always perform well in all stages, as seen in the variability of ranks within each stage type.
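The proportions in Table 4 follow directly from the two count columns. The short sketch below recomputes them from the counts printed in the table; the variable and function names are illustrative, and reading the first count as "stages in which the winner ranked 5th or better" is an interpretation of the table header.

```python
# Counts copied from Table 4: (stages ranked in the top 5, total stages of that type).
table4_counts = {
    "Flat": (83, 113),
    "Hilly": (20, 26),
    "Mountain": (111, 147),
    "Time trial": (36, 51),
}


def proportion_pct(hits, total):
    """Return hits / total as a percentage rounded to two decimals."""
    return round(100 * hits / total, 2)


proportions = {
    stage_type: proportion_pct(hits, total)
    for stage_type, (hits, total) in table4_counts.items()
}

# Pooled proportion across all stage types.
total_hits = sum(hits for hits, _ in table4_counts.values())
total_stages = sum(total for _, total in table4_counts.values())
pooled = proportion_pct(total_hits, total_stages)

for stage_type, pct in proportions.items():
    print(f"{stage_type}: {pct}%")
print(f"All stage types: {pooled}%")
```

Each stage type lands between roughly 70% and 77%, which is what the "approximately 7 in 10" summary refers to.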

Overall winners are almost always formidable at climbing mountains. Arguably, mountain stages are the most difficult parts of the race, considering the steep climbs, thin air at high altitudes and narrow descents. Having said that, these stages offer the largest potential for time gaps, allowing riders to set themselves apart from their rivals. Table 4 demonstrates that overall winners thrive in mountain stages, where they are ranked 5th or better in at least three out of four stages. For this reason, winners are, on average, lighter cyclists who have an advantage in overcoming gravity on steep climbs and greater aerobic power (Prinz and Wicker (2012)), as found in section 2. In addition, Figure 10 implies that one can also expect large variability in the rankings of overall winners in these grueling mountain stages.

Amongst the different stage types, overall winners had the lowest median rank and the smallest variation in time trials (Figure 10). This implies two things. First, overall winners must be physiologically equipped to win individual time trials. Individual time trials test endurance and solo strength by forcing riders to negotiate a course comprising curved roads, small-scale slopes and multiple terrain types while keeping a steady and appropriate pace. Unlike ephemeral sprints, time trials are more like short marathons in which riders are expected to maintain high intensities of exercise for long periods of time (Santalla et al. (2012)). On top of these factors, studies have shown that the physiological effects of heat and altitude can each reduce performance by as much as 35% in isolation (Faulkner and Griggs (2016)).

Secondly, consistently strong results in team time trials convey that a great team is a significant driver of stage success. A strategic reality is that the speediest individual will most likely not surpass a group of cyclists, as the perfect sprints are often the result of seamless teamwork and aerodynamic advantages (Prinz and Wicker (2012)). This is especially complex as teams must find an ideal pace despite their riders’ diverse body types and skills. For instance, a team’s domestiques (French for servants) work in unison to allow drafting (shielding from wind resistance by the riders in front) and so provide a smoother ride for the team leader (Allain (2018)). This can ensure that the leader arrives at crucial moments of the race with little energy expended.

Unlike the constant speed of mountain stages, flat stages are associated with short bursts of high power and long periods of reduced exercise intensity (Vogt et al. (2007)). With flat stages accounting for nearly one third of all stages in an edition on average, it is crucial for overall winners to be proficient at them. Although flat stages are tailored for sprinters and heavier riders, they are also a game of strategy. For most of the stage, cyclists ride close together in the peloton to benefit from favorable aerodynamic conditions through drafting. Studies have shown that this tactical behavior is especially effective within the peloton, where riders can save up to 30% of energy costs (Prinz and Wicker (2012)). In the final 25km, teams form breakaway groups to split from the peloton and accelerate towards the finishing line (Vogt et al. (2007)). Overall winners tend to be particularly good at this strategy and are ranked tenth or better in most flat stages (Figure 10).

Conclusion

At the conclusion of our analysis, it is evident that a rider must have a range of attributes and strategies in place in order to win the general classification. Specifically, the analysis highlights the importance of targeting appropriate stages to attack the competition, training for the right terrain and building the most efficient body to endure the gruelling 23-day tour.

Furthermore, as tour organisers introduced new trophies, the number of riders in contention for the yellow jersey has dwindled over the years as riders focus on specific terrains, making it more difficult for overall winners to lead stages. Each year, tour organisers vary the stages of the course, forcing riders to tailor their tactics to the different adversities of each course.

Overall winners do not necessarily win all stages, but appear to be consistent in their ranks. In most cases, overall winners climb the ranks gradually, ranking better than 40th in stages 1-7 and better than 20th from stage 8 onwards. Our analysis implies that overall winners may start wearing the yellow jersey from stage 17 onwards.

Our analysis suggests that overall winners are all-rounders who deploy strategic tactics. On top of their physiological traits, overall winners are usually formidable at climbing mountains. Additionally, an overall winner does not triumph without his team, as the speediest rider may not surpass a group of cyclists. The team allows its leader to take advantage of aerodynamics to ride at higher speeds with less energy expended, preserving his energy for critical stages.

References

“1969 Tour de France.” 2020. https://bikeraceinfo.com/tdf/tdf1969.html.

Allain, Rhett. 2018. “The Physics of Drafting in the Tour de France.” https://www.wired.com/story/the-physics-of-drafting-in-the-tour-de-france/.

Andreff, Wladimir. 2016. “The Tour de France: A Success Story in Spite of Competitive Imbalance and Doping.” In The Economics of Professional Road Cycling, 233–55. Springer.

“Cyclingrank.com.” 2020. https://www.cyclingranking.com/riders/overall.

EEB. 2020. “Tour de France.” https://www.britannica.com/sports/Tour-de-France.

Encyclopedia Britannica, Inc. 2020. “Tour de France Cycling.” https://www.britannica.com/sports/Tour-de-France.

EngberD. 2005. “How Do Cycling Teams Work?” https://slate.com/news-and-politics/2005/07/how-do-cycling-teams-work.html.

F, McKay. 2013. “A History on Blood Transfusions in Cycling, Part 3.” https://www.cyclingnews.com/features/a-history-on-blood-transfusions-in-cycling-part-3/.

Faulkner, Steve, and Katy Griggs. 2016. “How Does a Tour de France Favourite Win on the Scorching Mountain Slopes?” https://theconversation.com/how-does-a-tour-de-france-favourite-win-on-the-scorching-mountain-slopes-61316.

FotheringhamA. 2017. “How Could It Come to This? Miguel Indurain’s Fall from Grace.” https://www.cyclingnews.com/features/how-could-it-come-to-this-miguel-indurains-fall-from-grace/.

J, Wilson. 2013. “Lance Armstrong’s Doping Drugs.” https://edition.cnn.com/2013/01/15/health/armstrong-ped-explainer/index.html.

M, Mitchell. 2017. “Champs Elysees = the Most Iconic Street in Cycling.” https://www.procyclinguk.com/champs-elysees/.

M, Smith. 2016. “Https://Www.industrytap.com/Technology-Improving-Sports-Injury-Prevention-Recovery/38900.” https://www.industrytap.com/technology-improving-sports-injury-prevention-recovery/38900.

Prinz, Joachim, and Pamela Wicker. 2012. “Team and Individual Performance in the Tour de France.” Team Performance Management: An International Journal 18 (7/8): 418–32.

S, Farrand. 2020. “Bjarne Riis: If I Fail Again, Put Me in Jail.” https://www.cyclingnews.com/features/bjarne-riis-if-i-fail-again-put-me-in-jail/.

Santalla, Alfredo, Conrad P Earnest, José A Marroyo, and Alejandro Lucia. 2012. “The Tour de France: An Updated Physiological Review.” International Journal of Sports Physiology and Performance 7 (3): 200–209.

Steve, Haake J. 2009. “The Impact of Technology on Sporting Performance in Olympic Sports.” Journal of Sports Sciences 27 (13): 1421–31.

Vogt, Stefan, Yorck Olaf Schumacher, Andreas Blum, Kai Roecker, Hans-Hermann Dickhuth, Andreas Schmid, and Lothar Heinrich. 2007. “Cycling Power Output Produced During Flat and Mountain Stages in the Giro d’Italia: A Case Study.” Journal of Sports Sciences 25 (12): 1299–1305.

“What Makes Ineos Unbeatable?” 2019. https://www.bicycling.com/racing/a28637007/ineos-tour-de-france/.

Wikipedia contributors. 2020. “2018 Tour de France — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=2018_Tour_de_France&oldid=960595512.

Wikipedia contributors. 2020. “2019 Tour de France — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=2019_Tour_de_France&oldid=968463007.